LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems
Abstract
Matrix factorization (MF) plays a key role in many applications such as recommender systems and computer vision, but MF can take a long time to run on the large matrices commonly seen in the big data era. Many parallel techniques have been proposed to reduce the running time, but few parallel MF packages are available. Therefore, we present an open source library, LIBMF, based on recent advances in parallel MF for shared-memory systems. LIBMF includes easy-to-use command-line tools, interfaces to C/C++ languages, and comprehensive documentation. Our experiments demonstrate that LIBMF outperforms state-of-the-art packages. LIBMF is BSD-licensed, so users can freely use, modify, and redistribute the code.
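To make the idea of matrix factorization concrete, the following is a minimal single-threaded sketch, assuming a stochastic gradient descent (SGD) fit with L2 regularization, which is one common way to learn the two factor matrices. It is not LIBMF's code or API: the function names `factorize` and `rmse`, the parameters `k`, `lr`, `reg`, `epochs`, and the toy ratings are all hypothetical and chosen only for illustration.

```python
import random


def factorize(ratings, n_rows, n_cols, k=2, lr=0.02, reg=0.01, epochs=1000, seed=0):
    """Fit P (n_rows x k) and Q (n_cols x k) so that dot(P[u], Q[v]) ~ r.

    `ratings` is a list of (row, col, value) triples for the observed entries.
    All names and defaults are illustrative assumptions, not LIBMF settings.
    """
    rng = random.Random(seed)
    # Small random positive initialization of both factor matrices.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_cols)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit observed entries in random order
        for u, v, r in samples:
            pred = sum(P[u][f] * Q[v][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qv = P[u][f], Q[v][f]
                # Gradient step on the squared error plus L2 penalty.
                P[u][f] += lr * (err * qv - reg * pu)
                Q[v][f] += lr * (err * pu - reg * qv)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed entries."""
    total = 0.0
    for u, v, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[v]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Toy data: 4 rows (e.g. users), 3 columns (e.g. items), 8 observed entries.
toy_ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0),
    (3, 0, 1.0), (3, 2, 4.0),
]

P, Q = factorize(toy_ratings, n_rows=4, n_cols=3)
print("training RMSE:", rmse(toy_ratings, P, Q))
```

The sketch processes one observed entry at a time on a single thread. A parallel shared-memory solver of the kind the abstract describes would have to coordinate such updates across threads, which this example does not attempt.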
Similar resources
Nonnegative Matrix Factorization via Newton Iteration for Shared-memory Systems
Nonnegative Matrix Factorization (NMF) can be used to approximate a large nonnegative matrix as a product of two smaller nonnegative matrices. This paper shows in detail how an NMF algorithm based on Newton iteration can be derived utilizing the general Karush-Kuhn-Tucker (KKT) conditions for first-order optimality. This algorithm is suited for parallel execution on shared-memory systems. It was...
Solving linear systems with vectorized WZ factorization
In the paper we present a vectorized algorithm for WZ factorization of a matrix which was implemented with the BLAS1 library. We present the results of numerical experiments which show that vectorization accelerates the sequential WZ factorization. Next, we parallelized both algorithms for a two-processor shared-memory machine using the OpenMP standard. We present performances of these...
Tall and Skinny QR Matrix Factorization Using Tile Algorithms on Multicore Architectures
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on ...
Developing a High Performance Software Library with MPI and CUDA for Matrix Computations
Nowadays, the paradigm of parallel computing is changing. CUDA is now a popular programming model for general-purpose computations on GPUs, and a great number of applications have been ported to CUDA, obtaining speedups of orders of magnitude compared to optimized CPU implementations. Hybrid approaches that combine the message passing model with the shared memory model for parallel computing are a so...
DuctTeip: A Task-Based Parallel Programming Framework for Distributed Memory Architectures
Current high-performance computer systems used for scientific computing typically combine shared memory compute nodes in a distributed memory environment. Extracting high performance from these complex systems requires tailored approaches. Task-based parallel programming has been successful both in simplifying the programming and in exploiting the available hardware parallelism. We have previou...
Journal title:
- Journal of Machine Learning Research
Volume 17, Issue
Pages -
Publication date: 2016